Abstract

Background: Our previous study on ripe apples from a progeny of a cross between the apple cultivars 'Prima' and 
'Fiesta' showed a hotspot of mQTLs for phenolic compounds at the top of LG16, both in peel and in flesh tissues. 
In order to find the underlying gene(s) of this mQTL hotspot, we investigated the expression profiles of structural 
and putative transcription factor genes of the phenylpropanoid and flavonoid pathways during different stages of 
fruit development in progeny genotypes. 

Results: Only the structural gene leucoanthocyanidin reductase (MdLAR1) showed a significant correlation between
transcript abundance and content of metabolites that mapped on the mQTL hotspot. This gene is located on LG16
in the mQTL hotspot. Progeny that had inherited one or two copies of the dominant MdLAR1 allele (Mm, MM)
showed a 4.4- and 11.8-fold higher expression level of MdLAR1, respectively, compared to the progeny that had
inherited the recessive alleles (mm). This higher expression was associated with a four-fold increase of procyanidin
dimer II as one representative metabolite that mapped in the mQTL hotspot. Although expression levels of several
structural genes were correlated with expression of other structural genes and with some MYB and bHLH
transcription factor genes, only expression of MdLAR1 was correlated with metabolites that mapped at the mQTL
hotspot. MdLAR1 is the only candidate gene that can explain the mQTL for procyanidins and flavan-3-ols. However,
mQTLs for other phenylpropanoids such as phenolic esters, dihydrochalcones and flavonols, that appear to map at
the same locus, have so far not been considered to be dependent on LAR, as their biosynthesis does not involve
LAR activity. An explanation for this phenomenon is discussed.

Conclusions: Transcript abundances and genomic positions indicate that the mQTL hotspot for phenolic
compounds at the top of LG16 is controlled by the MdLAR1 gene. The dominant allele of the MdLAR1 gene,
causing increased content of metabolites that are potentially health beneficial, could be used in marker-assisted
selection in current apple breeding programs and for cisgenesis.
Background

Apple (Malus × domestica Borkh.) is an important
source of many secondary metabolites known as phenolic
compounds [1,2]. These phenolic compounds have
various functions in the plant, such as protection against
ultraviolet light [3]. Procyanidins, one class of phenolic
compounds, are polymers of flavan-3-ols. In plants they
often function to prevent herbivory. They provide an
astringent taste to foodstuffs and, at longer chain length,
form complexes with proteins. Procyanidins are increasingly
recognized for their beneficial effects on human
health [4].

One of the important benefits of these compounds to 
consumers is their potential role against various human 
diseases such as cancer, coronary heart diseases, cardio- 
vascular diseases, and diabetes [5,6]. 

Phenolic compounds are synthesised through the phenylpropanoid
and flavonoid pathways. For procyanidins,
the biosynthetic pathway largely overlaps with that of
anthocyanins. These complex biochemical pathways
involve a series of enzymes. Many of these enzymes, as
well as the encoding genes, have been functionally
characterized [7-10]. The first committed step to procyanidins
has been postulated to be carried out by
leucoanthocyanidin reductase (LAR) [11].

In our previous study [1] we genetically mapped phen- 
olic compounds that were detected in peel and in flesh 
of ripe apple fruits. We detected a hotspot of QTLs of 
metabolites (mQTLs) at the top of LG16. The metabo- 
lites that mapped at this locus were procyanidins 
(flavan-3-ols and their polymers), and other phenolic 
compounds such as phenolic esters and flavonol- and 
dihydrochalcone derivatives. All these compounds be- 
long to the phenylpropanoids, and one could therefore 
speculate that the mQTL is controlled by a biosynthetic 
gene from the phenylpropanoid pathway, or by a tran- 
scription factor controlling this pathway. 

The aim of the present study was to unravel which
gene controlled the phenylpropanoid mQTL hotspot in
apple. The approach used involved an expression analysis
of structural and transcription factor genes of the
phenylpropanoid and flavonoid pathways. By looking
closer at the draft sequence of the whole genome of the
apple cultivar 'Golden Delicious' [12], the structural gene
leucoanthocyanidin reductase (MdLAR1) and seven transcription
factor genes were detected in the genetic window
of the mQTL hotspot. Therefore the transcript
abundances of these genes were investigated. In addition,
expression profiles of the structural genes of the phenylpropanoid
and flavonoid pathways outside of the mQTL
hotspot were studied. A strong positive correlation between
the expression level of the MdLAR1 gene and the
level of metabolites that mapped at LG16 was observed.
This was not found for any of the other genes studied.
This indicates that the MdLAR1 gene is the major candidate
gene controlling the mQTL hotspot on LG16. Further
evidence is provided by the fact that the MdLAR1
gene is the only structural gene of the phenylpropanoid
and flavonoid pathways that resides in the mQTL hotspot.

Methods 

In this study, fruits from the segregating F1 population
derived from the cross between the cultivars 'Prima' and
'Fiesta' were used. This population was also used in our previous
study, in which the mQTL hotspot and other
mQTLs were detected [1].

Selection of genotypes and harvesting of fruits for gene 
expression studies 

We selected genotypes based on the clear genetic segregation
of the metabolite procyanidin dimer II (Additional
file 1). This metabolite belongs to the phenylpropanoid
pathway and its concentration showed a clear segregation
(Figure 1). It was mapped at the mQTL hotspot on
LG16 as described by Khan et al. [1]. This metabolite
was used as a representative for all other metabolites
that mapped at the mQTL hotspot. The progeny
genotypes from the cross 'Prima' x 'Fiesta' were divided
into two groups based on the clear segregation of
procyanidin dimer II, i.e. one group (Group A) with low content
and another group (Group B) with high content of
procyanidin dimer II (Additional file 2). The trees were
at full bloom from 26 to 30 April 2010. Fruits from trees
in the trial orchard located in Randwijk, the Netherlands,
were harvested at three developmental stages (Figure 2),
eight fruits per tree for each developmental stage, and were
subsequently peeled and stored as described in our
previous article [1]. Ten genotypes were selected from
'Group A' and nine genotypes from 'Group B',
with two trees per genotype (two biological replicates;
Additional file 1). As references, the parents of the segregating
F1 population were included in the analysis, with two
trees for each parent. Both parents belong to the heterozygous
class of genotypes. The genotype numbers and genotype
classes used in this study are given in Additional file 1.
The diameters of individual fruits were measured at
each developmental stage using an electronic digital caliper,
model VWRI8 19-0012 of Control Company. The average
size at 34 Days After Full Bloom (DAFB) was 24 mm, at 60
DAFB 40 mm, and at 95 DAFB 62 mm (Figure 2; Additional
file 1).

Figure 1 The procyanidin dimer II contents (ion counts per
second) of the ripe fruits of the genotypes that were used for
studying gene expression during fruit development [1]. This
metabolite segregated clearly in the F1 progeny in both peel and
flesh, and represents the metabolites that mapped in the mQTL
hotspot on LG16. The genotype classes are mm (homozygous
recessive), Mm (heterozygous dominant) and MM (homozygous
dominant).

Figure 2 The three developmental stages of growth of apple fruit used for gene expression in this study. A = 34 Days After Full Bloom
(DAFB), B = 60 DAFB and C = 95 DAFB.

There were three classes of genotypes based on co- 
segregating genetic markers: MM was the homozygous 
dominant class. These progeny inherited from each parent 
one dominant allele for increased content of the procyani- 
din dimer II. Mm was the heterozygous class, which has 
one dominant allele from one parent and one recessive 
allele from the other parent. The heterozygous progeny 
had high content of the metabolite too. The third class is 
the homozygous recessive class mm. This class received 
both recessive alleles from the two parents, and showed a 
low content of the metabolite. 
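
The three genotype classes and the dominant inheritance described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 1:2:1 enumeration simply follows from both parents being heterozygous (Mm), as stated earlier in this section.

```python
from collections import Counter
from itertools import product


def genotype_class(allele_a, allele_b):
    """Return 'MM', 'Mm' or 'mm' for two inherited alleles ('M' or 'm')."""
    # Uppercase 'M' sorts before lowercase 'm', so heterozygotes become 'Mm'.
    return "".join(sorted((allele_a, allele_b)))


def expected_content(genotype):
    """Dominant model: a single copy of M is enough for high content."""
    return "high" if "M" in genotype else "low"


# Each heterozygous parent (Mm) passes on either M or m.
offspring = Counter(genotype_class(a, b) for a, b in product("Mm", "Mm"))

for genotype in ("MM", "Mm", "mm"):
    print(genotype, offspring[genotype], expected_content(genotype))
```

The enumeration gives the classes MM, Mm and mm in the ratio 1:2:1, with only the mm class expected to show low content of procyanidin dimer II.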

RNA isolation from apple fruits 

Total RNA was isolated from peel and flesh of apple
fruits separately according to the CTAB method
described by Asif et al. [13]. The RNA quantity was
measured on a NanoDrop® spectrophotometer, model
ND-1000 (Isogen Life Science), as
explained by Khan et al. [14], and the RNA quality and
quantity were checked by running 2 µl of the RNA
sample on a 1.5% agarose gel. First-strand complementary
DNA (cDNA) was synthesized using the iScript™
cDNA Synthesis Kit (Bio-Rad) according to the manufacturer's
manual.

Selection of genes for qRT-PCR studies and primer design 

The MdLAR1 gene was detected in the middle of the
genetic window for the mQTL hotspot at the top of
LG16 [1]. Therefore this structural gene was included in
the gene expression study. In addition, the other structural
genes of the phenylpropanoid and flavonoid pathways
were included (Table 1). Furthermore, all putative
transcription factor genes that were located within the
genetic window of the mQTL hotspot were included.
We also added putative transcription factor genes which
neighboured this genetic window using the 'Golden
Delicious' genome sequence [12], or which showed high
homology to genes that are known to regulate the phenylpropanoid
and flavonoid pathways in other plant species
(Table 1). Primer pairs for structural genes of the
phenylpropanoid and flavonoid pathways were kindly
provided by Plant and Food Research, New Zealand. The
primer pairs for the other candidate genes were designed
with the online program 'Primer3Plus'
(http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi).
The primer names and their forward and reverse
sequences are given in Table 1. The primers were tested
using qRT-PCR in the same way as explained by Khan
et al. [14]. The qRT-PCR products were checked for
quality by confirming a single clear peak in the melting
curve and a clear band of the expected amplicon size on a
1.5% agarose gel.

Table 1 Genes that were included in the expression analysis

| Group | Gene name | Gene ID | Forward primer (5'→3') | Reverse primer (5'→3') | LG on 'Golden Delicious' | Gene position on LG (kbp) |
|---|---|---|---|---|---|---|
| Structural genes | MdLAR1pair1 | MDP0000171928 | GTGGTOACGGAGGCACAGT | CCGAGGAGAAAGGACTACCC | LG16 | 1536 |
| | MdLAR1pair2 | AY830131 | GTGOTCGATGGCmOTC | TAACAAGCTCACCCCCAAAC | LG16 | 1530 |
| | MdLAR2 | AY830132 | ATGCCACAATCGTGTCAAAA | GGCTGGC1TCAGCTACAAAC | LG13 | 2860 |
| | MdPAL | ES790093 | CGAGGAGTGTGACAAGGTGTOCA | AGGAATGCAGCATGTAAACCGTGAC | LG4 | 8075 |
| | MdC4H | EB135197 | GGACGmAGTCCAGAAOTCGAGCT | AOTCATCACAATGGTGGAATGCTO | LG11 | 4614 |
| | Md4CL | EB122629 | CATAAACAGTGTCCCCAAGTCAGCAT | AGTGTOCTACAAGCOTCCCGATAA | LG11 | 5126 |
| | MdCHS | CN944824 | GGAGACAACTGGAGAAGGACTGGAA | CGACATOATACTGGTGTOTCA | LG9 | 15948 |
| | MdCHI | CN946541 | GGGATAACCTCGCGGCCAAA | GCATCCATGCCGGAAGCTACAA | LG1 | 16132 |
| | MdF3H | CN491664 | TGGAAGC1TGTGAGGACTGGGGT | CTCCTCCGATGGCAAATCAAAGA | LG5 | 20985 |
| | MdDFR | AF117268 | GATAGGGmGAGTOAAGTA | TCTCCTCAGCAGCCTCAG^CT | LG12 | 21890 |
| | MdANS | AF117269 | GATGAAGGGAGGCTGGAGAAAG | GTGGAGGATGAAGGTGAGTGC | LG6 | 13776 |
| | MdFLS | EB137300 | TCAGATGGAGATAATGAGCAATGGAAA | ATOACGGGGTOACAAGCTGTGG | LG8 | 15333 |
| | MdANR | EB125405 | TCGCTGGCTOTGATCCTCCTG^ | CCGmGCCAAACTCAGCAAA™ | LG5 | 2243 |
| | MdHCTchr9 | MDP0000851389 | CGATGCTG 1 1 1 ICAGAACCA | gcagcagacgaggatga™ | LG9 | 24590 |
| | MdC3Hchr8 | MDP0000466557 | CAAAGGAGGTGCTCAAGGAG | TGGACTCGACCATAGCAGTG | LG8 | 29024 |
| | MdF3'Hchr6 | MDP0000539956 | ACTCTC1TCATGCGC1TGGT | TGCCTATCCTCACCCAAAAG | LG6 | 22805 |
| | MdF3'Hchr14 | MDP0000370951 | ACCATOAACCCCAACAACG | ATCACGGmGGAGCTOTG | LG14 | 27562 |
| | MdUFGT | AF117267 | AAGGTCTCTCCAATGTACGAAT | AGGAGmGTOACmGGACT | LG1, LG7 | 29053, 26292 |
| TF genes at mQTL hotspot | MdMYB1361 | MDP0000375685 | CTGGGGGTOAAGTAGTCCA | CTCCGTGGTGGOTGATAAT | LG16 | 1361 |
| | Mdb-HLH1967 | MDP0000261293 | GATACGGCATCATOCTGCT | GCCTGAGGAmCCAACAAA | LG16 | 1967 |
| | Mdb-HLH1881 | MDP0000154272 | CTCAACCGGGACTOTCCAA | GCTCATCCTCCCACACAm | LG16 | 1881 |
| | Mdb-HLH1543 | MDP0000319726 | GAGCTGAAACGCCAAAC1TC | CGGTGATGAACAACACGTO | LG16 | 1543 |
| | MdAP21480 | MDP0000939633 | GCACOTCAACGAAGAGGAC | GAC1TGGAGTGGGAGCTCAG | LG16 | 1475 |
| | MdG2L61440 | MDP0000202657 | AGACCGACTCCAACAATOG | GGACTGGTGGTGAGACCTGT | LG13 | 2702 |
| | MdbZIP1380 | MDP0000250967 | CTGmCTGGCAAAGGOTC | CCATCAACATOCAGTGGAC | LG16 | 1376 |
| Transcription factor genes outside the mQTL hotspot | MdCOL1220 | MDP0000185616 | TGA^ATGGGGTGCCAAT | TAATCACCGCCTCGTAATCC | LG16 | 1224 |
| | Mdb-HLH1080 | MDP0000725991 | GGCCAATGACACCTCCmA | TGAGCTGTGGAATGAGCAAC | LG16 | 1084 |
| | MdMYB1070 | MDP0000659260 | ACTCCGCAAGAACAGCTCAT | GCTGTOGACTCGATGTOA | LG16 | 1058 |
| | MdC2H21020 | MDP0000183099 | CCTCCTCACCTCCTCTCTCC | CCCGGCTCTGTOTAGTACC | LG16 | 1021 |
| | MdC2H21000 | MDP0000283750 | ATOAGCAAGTOGGTGTCC | mGCmGTGCAGTOAGG | LG16 | 1003 |
| | MdMyb5a.A | MDP0000791870 | GGGGAGGAGGAAATGAAGAG | CAGAGTCCCAGCCAAATG^ | LG3 | 967 |
| | Mdb-HLH33 | MDP0000309179 | GGAGACATCAAAACCCGAAA | TGAAGGACATGCAAAGCAAG | LG15 | 37144 |
| | MdTTG1 | MDP0000906307 | GACCCGGATACCCmCAAT | AAACTCGCTGGTC1TGCTGT | LG1 | 28763 |
| | Mdb-HLH3 | MDP0000225680 | GTCGCCATOGTAAGGCTAA | CCACCGTGGTCTCAATO^ | LG11 | 32877 |
| | MdMyb9 | MDP0000210851 | GGCCACTAGGTOACCAAAA | ATCATCGCAGCCAAAGTOT | LG8 | 9430 |
| | MdMyb11A | MDP0000437717 | TGAAGGTCGCTCATGTOTG | ATOACCGCCTGGmAGTG | LG13 | 30373 |
| Reference gene | MdGAPDH | CN494000 | GCTGCCAAGGCTGTOGAA | ACAGTCAGGTCAACAACGGAAAC | LG16, LG13 | 5636, 7648 |

17 putative structural genes of the phenylpropanoid pathway, seven putative transcription factor genes in the genetic window of the mQTL hotspot on LG16, and 11 putative transcription factor genes outside this
mQTL hotspot were studied. MdGAPDH was used as a reference gene. The last four digits in some transcription factor genes represent their positions on the genome in kbp.

In view of the importance of MdLAR1, two primer
pairs were designed for this gene in two different
non-overlapping regions. By means of these primer pairs, the
two fragments were amplified and sequenced for each
genotype class. This was done to verify the
gene specificity of the primers, since MdLAR1 on LG16
and MdLAR2 on LG13 (Table 1) show 62% similarity at
the nucleotide level. The sequences of MdLAR1 and
MdLAR2 in 'Prima' x 'Fiesta' showed good alignment
with sequences from cv. 'Golden Delicious', on which the
primers for qRT-PCR were designed.
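
The source does not state how the 62% similarity between MdLAR1 and MdLAR2 was computed. As a minimal sketch of one common definition, the Python function below calculates percent identity over the columns of a pairwise alignment in which neither sequence has a gap. The function name and the two short aligned sequences are hypothetical and serve only as an illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical positions over aligned columns without a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from MdLAR1 or MdLAR2.
identity = percent_identity("ATGCCA-A", "ATGTCAGA")
print(round(identity, 1))
```

Other definitions (for example, counting gap columns in the denominator) give different percentages, so the chosen convention should be reported together with the value.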

Performing qRT-PCR and data analysis

Gene expression was measured using Fluidigm Dynamic
Array integrated fluidic circuits for cDNA samples from peel
and flesh for the genotypes and developmental stages mentioned
in Additional file 1. The Fluidigm BioMark™ System
and EvaGreen DNA binding dye were used (http://www.fluidigm.com).
Three 96x96 Dynamic Arrays of Integrated Fluidic
Circuits, comprising 48 primer pairs in two replicates, were
used. The qRT-PCR set-up for the reference gene and other
control samples, and the data analysis, were performed as
described by Khan et al. [14].
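
The data analysis itself follows Khan et al. [14] and is not reproduced in this article. As a rough illustration only, the sketch below assumes the widely used 2^-ΔCt calculation of expression relative to a reference gene (here MdGAPDH); this formula is an assumption, not a statement of the method in [14]. The function names and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to the reference gene (2^-dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def fold_change(class_values, baseline_values):
    """Ratio of mean relative expression between two genotype classes."""
    return mean(class_values) / mean(baseline_values)


# Hypothetical Ct values given as (target, reference) for two trees per class.
mm_class = [relative_expression(26.0, 20.0), relative_expression(26.0, 20.0)]
MM_class = [relative_expression(23.0, 20.0), relative_expression(23.0, 20.0)]

print(fold_change(MM_class, mm_class))
```

With these made-up Ct values the MM class shows an eight-fold higher relative expression than the mm class; the fold differences reported in this article come from the measured data, not from this example.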

Correlation network analysis 

The correlation coefficients were calculated between the
contents of seven metabolites representative for different
branches of the phenylpropanoid and flavonoid pathways,
and the expression of 18 structural genes and 18
transcription factors possibly involved in these pathways.
Before calculation of the correlation coefficients, the
data were log10-transformed for normalisation purposes.
Scatter plots were made between the different log10-transformed
variables, in order to make sure that outliers
did not bias correlation values, and to check the
distributions.
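
The transformation and correlation step described above can be sketched as follows. The article does not name the type of correlation coefficient, so Pearson's r is assumed here; the function names and the example values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log10_correlation(xs, ys):
    """Correlation after log10 transformation (all values must be > 0)."""
    return pearson([math.log10(x) for x in xs],
                   [math.log10(y) for y in ys])


# Hypothetical relative expression and metabolite content (ion counts/s).
expression = [0.02, 0.09, 0.25, 0.30]
content = [1500.0, 5200.0, 6100.0, 7000.0]
print(round(log10_correlation(expression, content), 3))
```

Because log10 is undefined for zero or negative values, any zero measurements would need to be handled before the transformation; the source does not describe how this was done.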

Visualization of the correlation network was performed with
the Pajek software package (http://pajek.imfm.si/doku.php).
Besides the biological quantitative pattern that is observed
in a set of samples as the result of physiological processes,
data may have a particular embedded 'experimental pattern'
which is due to the performance of the experiment, such as
extraction errors and measurement or calibration errors. Different
analytical methods run on the same set of samples may
therefore give different experimental patterns. Consequently, correlations
between variables observed within a particular experiment
may be stronger than correlations between variables from
different experiments. Here we have a correlation matrix of
two different experiments and, therefore, three types of
correlations (sub-matrices) are present: gene-to-gene correlations,
metabolite-to-metabolite correlations and gene-to-metabolite
correlations. Lower correlation coefficients
might be expected in the third sub-matrix due to interference
of different experimental patterns. To compensate for
this effect and to obtain a balanced correlation network, we
standardized the correlation coefficients separately for each of
the three sub-matrices. For this, the maximum positive and
maximum negative correlation coefficients r were determined in each
sub-matrix and set to 1.0 and -1.0, respectively. The other
correlation coefficients of each sub-matrix were expressed
relative to these maxima. The standardized correlation
coefficients are further denoted as r_s.
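
The standardization of one sub-matrix can be sketched as below. This is one reading of the description above, under the assumption that positive coefficients are divided by the largest positive r and negative coefficients by the absolute value of the most negative r, and that self-correlations on the diagonal have been removed beforehand. The function name and the example values are hypothetical.

```python
def standardize_submatrix(coefficients):
    """Scale r values so the extremes of the sub-matrix become 1.0 and -1.0."""
    max_positive = max((r for r in coefficients if r > 0), default=None)
    max_negative = min((r for r in coefficients if r < 0), default=None)
    standardized = []
    for r in coefficients:
        if r > 0:
            standardized.append(r / max_positive)
        elif r < 0:
            standardized.append(r / abs(max_negative))
        else:
            standardized.append(0.0)
    return standardized


# Hypothetical gene-to-metabolite coefficients (off-diagonal values only).
gene_to_metabolite = [0.8, 0.4, -0.5, -0.25]
print(standardize_submatrix(gene_to_metabolite))
```

Applying the function separately to the gene-to-gene, metabolite-to-metabolite and gene-to-metabolite coefficients gives r_s values on a comparable scale across the three sub-matrices.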

Results 

Association between expression of structural genes of the 
phenylpropanoid/flavonoid pathways and concentrations 
of metabolites that mapped at the mQTL hotspot 

None of the 17 structural genes of the phenylpropanoid
and flavonoid pathways which were evaluated for gene
expression showed a significant correlation with the
content of procyanidin dimer II, except for the MdLAR1
gene (Figure 3). This was observed both in peel and flesh
tissues and at the three different fruit developmental
stages (Additional file 3). For MdLAR1 we evaluated the
expression using two different primer pairs, annealing at
different places in the MdLAR1 gene. For both primer
pairs, the measured expression showed a positive correlation
with procyanidin and other phenolic metabolites
that mapped in the mQTL hotspot (Figure 3). However,
the metabolites quinic acid and coumaroyl hexoside
appeared to have a negative correlation with the MdLAR1
expression (Figure 3).

The progeny that had inherited the recessive alleles for
low procyanidin dimer II content (mm) showed a low expression
of MdLAR1 throughout fruit development, both
in peel and flesh (Figure 2). The heterozygous
group (Mm) showed a higher expression than the
homozygous recessive (mm) group, whereas the homozygous
dominant progeny (MM) with high content of procyanidin
dimer II showed the highest expression of
MdLAR1 (Figure 2). The expression level of MdLAR1 was
highly significantly and positively correlated with procyanidin
dimer II content, according to Student's t-test (P < 0.1%).
On average, the MM genotypes had a four times higher
content of this metabolite at the ripe stage compared to
the mm genotypes (Figure 1), both in peel and flesh.
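
A significance test of a correlation coefficient with Student's t-test is usually based on the statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below computes this statistic under that standard assumption; the article does not give the formula, and the values of r and n in the example are hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, degrees of freedom) for testing whether r differs from 0."""
    if n < 3:
        raise ValueError("at least three observations are required")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    degrees_of_freedom = n - 2
    t = r * math.sqrt(degrees_of_freedom / (1.0 - r * r))
    return t, degrees_of_freedom


# Hypothetical example: r = 0.8 observed over 21 genotypes.
t_value, df = correlation_t_statistic(0.8, 21)
print(round(t_value, 2), df)
```

The resulting t value is then compared with the t distribution for the given degrees of freedom to obtain the P value.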

No significant correlation was detected between the transcript
abundance of any of the other evaluated genes
and the concentration of procyanidin dimer II
(Figure 3).

The transcript abundance of MdLAR1 was also significantly
correlated with the other metabolites that mapped
at the LG16 hotspot (Figure 3). However, the other studied
genes did not show this high correlation with any of
the metabolites at the mQTL hotspot. This suggests that
the MdLAR1 gene is the gene controlling the mQTL of
procyanidin dimer II, and of all other phenolic compounds
that mapped at this hotspot on LG16.

Figure 3 Correlations between the expression levels of genes and metabolite contents in peel. The different groups of genes have different
colors. Grey: structural genes of the phenylpropanoid pathway. Blue: the putative transcription factor genes that were detected in the mQTL hotspot;
Yellow: the putative transcription factor genes that were outside the mQTL hotspot but on LG16; Pink: putative transcription factor genes of the
phenylpropanoid pathway that are not located on LG16 but are at other regions of the genome. The metabolites are highlighted in purple and their
correlations with the expression levels of all the genes are given in bold text. The metabolite contents (ion counts per second) and expression data
(expression relative to the reference gene MdGAPDH) were log10-transformed before calculation of the correlation coefficients. Scatter plots (data not
shown) were made to evaluate the distributions and to exclude outliers that might strongly affect the correlations. The correlations between the
expression level of MdLAR1 and metabolite content are given in bold italics text. Green color shows other correlations (r > 0.7). Red color highlights
negative correlations between -0.5 and -1.0. For each metabolite group that had an mQTL at the LG16 hotspot, a representative metabolite is
included in this correlation matrix. Similar results were also found in flesh (Additional file 3).

Association between expression of transcription factor 
genes and concentrations of metabolites that mapped at 
the mQTL hotspot 

The finding that compounds from different locations in 
the pathway mapped at the same mQTL hotspot [1] 
could suggest that a transcription factor was involved in 
the mQTL. At the mQTL locus, seven candidate tran- 
scription factor genes were identified (Table 1). However, 
there was no clear correlation for any of these candidate 
transcription factor genes with the procyanidin dimer II 
content in peel and flesh (Figure 3). This indicates that 
the evaluated transcription factor genes at the mQTL 
hotspot were not responsible for this hotspot. 

In addition, 11 more candidate transcription factor
genes were identified elsewhere in the genome, some of them
on the basis of homology to known transcription factors involved in the
phenylpropanoid and flavonoid pathways (Table 1). No
clear correlation was found between the expression of
any of these putative transcription factor genes and the
metabolites that mapped at the hotspot (Figure 3). This
indicates that transcription factor genes outside the
mQTL hotspot were not controlling this hotspot either.

Associations between expression of structural genes and 
transcription factor genes 

The correlation matrix (Figure 3) shows that the expression
levels of many genes were correlated to one another.
As an example, the expression levels of the structural
genes MdPAL, MdC4H, MdUFGT, MdCHS, MdCHI,
MdF3H, MdDFR, MdLAR2 and MdANS were positively
correlated to one another. This cluster of structural
genes also showed a positive correlation with the expression
of the three transcription factor genes b-HLH1543,
MdMyb11.A and MdMyb9. This suggests that these
three transcription factor genes may regulate this cluster
of structural genes, but did not control the mQTL hotspot
on LG16.

We visualized the correlations of Figure 3 in a network
(Figure 4). This network is divided into two clusters.
The first cluster shows the correlations between the
metabolites that mapped at the mQTL hotspot of LG16.
The green lines in this cluster show positive correlations,
and the red lines negative correlations. These colours
correspond to the mapping results depicted in Figure 5.
Striking in this cluster of metabolites is the presence of
one gene only, i.e. MdLAR1. This gene is located in the
centre of the mQTL hotspot of these metabolites.

In the second cluster of the network, many structural
genes and transcription factors appear to be connected to
one another (Figure 4). Several genes in the network are
important nodes, and are connected to many other genes.
This is especially the case for MYB transcription factors,
such as MdMYB9, MdMYB11_A, and MdMYB5a_A, and
for b-HLH transcription factors, such as b-HLH1881,
b-HLH1967, and MdbHLH33 (Figure 4). Probably these
transcription factors regulate many structural genes in the
phenylpropanoid pathway. However, none of these transcription
factor genes is directly connected to metabolites
in the first cluster. In spite of the important regulatory
roles of the mentioned MYB and b-HLH transcription factor
genes in the phenylpropanoid and flavonoid pathways,
they were not responsible for the mQTL hotspot.
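
The separation of the network into clusters can be illustrated with a small sketch that keeps only strong correlations as edges and then collects the connected groups of nodes. The network in this study was drawn with Pajek; the code below is not that procedure. The threshold of 0.7, the function name and all coefficients in the example are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def correlation_clusters(correlations, threshold=0.7):
    """Group nodes that are linked by edges with |r_s| >= threshold."""
    graph = defaultdict(set)
    nodes = set()
    for (node_a, node_b), r_s in correlations.items():
        nodes.update((node_a, node_b))
        if abs(r_s) >= threshold:
            graph[node_a].add(node_b)
            graph[node_b].add(node_a)

    clusters = []
    seen = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from 'start'.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(component)
    return clusters


# Hypothetical standardized coefficients, not values from Figure 3.
example = {
    ("MdLAR1", "procyanidin dimer II"): 0.9,
    ("MdPAL", "MdCHS"): 0.8,
    ("MdCHS", "MdMYB9"): 0.75,
    ("MdMYB9", "procyanidin dimer II"): 0.1,
}
for cluster in correlation_clusters(example):
    print(sorted(cluster))
```

In this toy input, MdLAR1 groups with the metabolite while the other genes form a separate cluster, which mirrors the qualitative picture described above.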

Discussion 

Aim of the study 

In our previous study [1] we mapped phenolic compounds
in ripe fruits of a segregating F1 population derived from
the cross between the cultivars 'Prima' and 'Fiesta'. There
appeared to be a strong hotspot of mQTLs at the top of
LG16. Annotation of the metabolites showed that the
compounds that mapped on the LG16 hotspot belong to
the phenylpropanoid and flavonoid pathways (Figure 5).

We wanted to discover which gene(s) controlled this
mQTL hotspot. Therefore, in the present research, transcript
abundances for the candidate genes in the mQTL
region were measured in progeny genotypes that segregated
for these mQTLs. In addition, structural genes of
the phenylpropanoid and flavonoid pathways and putative
transcription factor genes that are candidates for
regulating these pathways and are located elsewhere were
evaluated, as described in detail in the Methods section.

MdLAR1 seems to be the only gene that can explain the
mQTL hotspot on LG16

As shown in Figure 3, MdLAR1 was the only gene for
which the expression was clearly correlated with the metabolite
content, both in peel and flesh. None of the
other genes showed a clear correlation with procyanidin
dimer II content. Moreover, Figure 6 shows clearly that
the expression of MdLAR1 was low for the genotypes
that had inherited the recessive alleles (mm), and had
low content of the representative metabolite procyanidin
dimer II. The progeny that had inherited one or two
dominant alleles (Mm, MM) had higher expression
levels of MdLAR1 and higher content of procyanidin
dimer II (Figure 6). This pattern was observed both in
peel and in flesh. This was not the case for any of the
other genes studied, which suggests that MdLAR1 was
responsible for the hotspot of mQTLs on LG16. Furthermore,
it indicates that MdLAR1 exerted its influence by
means of its expression level. Recent findings in grape
also showed a genetic association between a LAR gene
and a procyanidin QTL [15].

The procyanidin content was higher in the flesh than
in the peel (Figure 1). However, the expression of
MdLAR1 was lower in the flesh than in the peel
(Figure 6). A possible explanation is the fact that flavonols
and anthocyanins are produced in the peel only.
These may compete for the pool of available substrates,
leading to a relatively lower procyanidin level in the peel.

How can MdLAR1 explain the observed mQTLs?

The MdLAR1 gene clearly explains the mQTL for procyanidin
content, as LAR from leguminous species has been
implicated in the synthesis of catechin, a building block
for procyanidins [11]. Remarkably, we found several
mQTLs in the same hotspot on LG16 for metabolites
(kaempferol glycosides, phloridzin, phenolic esters) that
are synthesized by different branches of the phenylpropanoid
pathway [1] (Figure 5). Since LAR is not known to
be involved in the biosynthesis of these other metabolites,
the observed differential LAR expression does not provide
a straightforward explanation for the presence of the
mQTLs of these more upstream metabolites.

One could speculate about the effect that LAR overexpression
may have on the total flux through the
phenylpropanoid pathway. We note that the positively
associated mQTLs (procyanidins, dihydrochalcones,
phenolic esters and kaempferol glycosides) all
map downstream of 4-coumarate:CoA ligase (4CL) in
the pathway (Figure 5). A metabolite that maps upstream
of 4CL is coumaroyl hexoside, for which the
level was negatively correlated with e.g. procyanidins.
This is also apparent from Figure 3.

In apple, no 4CL-like gene is located at the mQTL
hotspot [12]. Moreover, the expression of the tested 4CL
gene did not correlate with the metabolites that mapped
at the hotspot. One explanation may be that MdLAR1
overexpression relieves a feedback mechanism on the
enzymatic activity of 4CL. 4CL is known to be feedback
inhibited by metabolites from the phenylpropanoid pathway,
such as naringenin [16]. Possibly, the enhanced
MdLAR1 activity leads to depletion of pathway intermediates
such as naringenin, which may thus increase
4CL activity and lead to a higher general flux from coumaroyl
glycoside towards the downstream metabolites.
Support for such a mechanism requires extensive experimentation,
which is outside the scope of this article.

An unlikely, but still possible, alternative explanation
for the mQTL hotspot could be that a transcription factor
at the mQTL hotspot regulated the expression of
MdLAR1. As we did not see any differential expression
of the transcription factor genes at the mQTL hotspot,
the different alleles of that transcription factor gene
would not differ in expression levels, but theoretically
could differ in the effect of the protein. Further, that transcription
factor might have influenced 4CL paralogues
that were not covered by the primer pair used. We do
not regard this as a likely explanation, but it cannot be
completely excluded.

Transcript abundances of several structural genes and 
transcription factor genes were correlated 

MdANR also contributes to the synthesis of procyanidins
(Figure 5). The expression level of this gene was significantly
correlated with the expression of several structural genes
such as PAL, CHS, DFR, and ANS (Figures 3 and 4).
Moreover, there was a clear correlation between the
expression of these structural genes and the expression
of the transcription factor genes MYB9 and MYB11
(Figures 3 and 4). Possibly, these transcription factors
regulated the mentioned structural genes. However, the
transcript abundances of none of these structural or
transcription factor genes correlated significantly with
the metabolite abundances that mapped at the mQTL
hotspot on LG16 (Figure 3). This indicates that these
structural genes were not the bottleneck for the pathway,
whereas MdLAR1 was probably the limiting factor
in the progeny that had inherited both lowly expressed
alleles of this gene (mm). Presumably, the bottleneck
was (partly) removed in the presence of one or two
more highly expressed alleles of MdLAR1 (MM, Mm).

Applications 

The dominant allele of the MdLAR1 gene, causing
increased content of metabolites that are potentially health
beneficial, could be used in marker-assisted selection in
current apple breeding programs. This selection could be
made at the seedling stage. This would reduce the production
costs for breeders by discarding undesired seedlings
at an earlier stage of growth, whereas in classical breeding
selection on fruit content is only possible after six years, when
trees start to bear fruit. Another possibility is to clone the
dominant allele or alleles for engineering increased content
of metabolite(s) into existing apple cultivars by different
transformation technologies including cisgenesis [17,18].
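
Selection at the seedling stage could, in principle, be reduced to a simple filter on marker genotypes. The sketch below assumes that a co-segregating marker has already been scored for each seedling as MM, Mm or mm; the function name, seedling identifiers and marker calls are hypothetical.

```python
def select_seedlings(marker_calls, desired=("MM", "Mm")):
    """Return the seedlings that carry at least one dominant allele."""
    return sorted(seedling for seedling, genotype in marker_calls.items()
                  if genotype in desired)


# Hypothetical marker calls for four seedlings.
calls = {"seedling_01": "MM", "seedling_02": "mm",
         "seedling_03": "Mm", "seedling_04": "mm"}
print(select_seedlings(calls))
```

In this example only the two seedlings carrying the dominant allele would be retained, and the mm seedlings could be discarded before they reach the fruit-bearing stage.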
Conclusions

Our results indicate that MdLAR1 is the most likely candidate
gene responsible for the mQTL hotspot for phenolic
compounds on LG16 of apple, both in peel and
flesh. Increased levels of metabolites downstream of
MdLAR1, such as the flavan-3-ols epicatechin and procyanidin
dimer II, may be directly caused by increased
transcript abundance of MdLAR1, as this gene is known
to participate in procyanidin biosynthesis.
Additional file 1: Genotypes used for measuring relative expression
of phenylpropanoid and flavonoid pathway genes and the
candidate genes in the mQTL hotspot. Average sizes of eight fruits per
tree genotype are also given at each stage.

Additional file 2: Genotypes used for measuring relative gene
expression. 'Group A' had low and 'Group B' had high content of
procyanidin dimer II. The content is given as log10-transformed values of
ion counts per second.

Additional file 3: Correlations between the expression levels of genes
and metabolite contents in flesh. The different groups of genes have
different colours. Grey: structural genes of the phenylpropanoid pathway.
Blue: the putative transcription factor genes that were detected in the
mQTL hotspot; Yellow: the putative transcription factor genes that were
outside the mQTL hotspot but on LG16; Pink: putative transcription factor
genes of the phenylpropanoid pathway that are not located on LG16 but
are at other regions of the genome. The metabolites are highlighted in
purple and their correlations with the expression levels of all the genes are given in
bold text. The metabolite contents (ion counts per second) and
expression data (expression relative to the reference gene MdGAPDH)
were log10-transformed before calculation of the correlation coefficients.
Scatter plots (data not shown) were made to evaluate the distributions
and to exclude outliers that might strongly affect the correlations. The
correlations between the expression level of MdLAR1 and metabolite
content are given in bold italics text. Light green colour shows other
correlations (r > 0.7) between the expression levels of different genes
and also between the expression of different genes and metabolite
contents. For each metabolite group that had an mQTL at the LG16
hotspot, a representative metabolite is included in this correlation matrix.